home *** CD-ROM | disk | FTP | other *** search
- ======================
- Ch 8 − Using CSV files
- ======================
-
- 8.1 What are CSV files?
- ---------------------------
- CSV stands for “comma-separated values” and is the name given to files
- consisting of lines of data, each of which contains individual data items
- separated from each other by commas. Such files are widely used to
- mail-merge using a wordprocessor (see Ch 9) and also to transfer data from
- one application to another, e.g. from a database to a spreadsheet or from a
- RISC OS database to a PC database. We will refer to each line in a CSV file
- as a record and each item of data in such a line as a field since records
- and fields are the source and destination of such data when it is exported
- from or imported into a Powerbase database. The following points should be
- noted:
-
- (1) Fields which are non-numeric (e.g. plain text items such as names and
- addresses) are often enclosed in double quotes (“”) whereas numeric data is
- not. This makes it possible for an application reading a CSV file to
- distinguish between numbers and strings (which might of course contain
- numerals) and also allows a comma to be used as a character within a string
- without being mistaken for a data separator. (Addresses often contain commas
- e.g. 112, Keighley Road). For many purposes the quotes aren’t necessary and
- may be omitted.
-
- (2) Null fields are usually included and can be located by looking for two
- commas with either nothing in between them or with only two double quotation
- marks between them, i.e. ,, or ,“”,. Each record in such a file will always
- contain the same number of fields. Powerbase does, however, allow nulls to
- be omitted entirely when creating a file in which case the number of fields
- per record will vary.
-
- (3) The way in which each record is terminated varies from one system to
- another. The last field in a record is followed not by a comma but by a line
- terminator. On RISC OS systems this is the same as in ordinary text files:
- the linefeed character (LF, ASCII value 10) and on PCs it is usually the
- carriage-return character (CR, ASCII value 13). You might, however,
- encounter CSV files in which both these characters are used, i.e. LF CR or
- CR LF. Powerbase lets you define any character or pair of characters as the
- record terminator.
-
- (4) Separators other than commas are sometimes used. The Tab character
- (ASCII value 9) is often used and such files are called TSV (“tab-separated
- values”) files. Powerbase lets you define any character, or even a pair of
- characters, as the field separator. All such files created by Powerbase will
- be of type &dfe and display the CSV file icon (although the default
- filenames offered do differ: “CSVfile” where the separator really is a
- comma, “TSVfile” where it’s a TAB and “?SVfile” otherwise). In what follows
- we will, for convenience, refer to them all as “CSV files” whether or not
- the separator is a comma.
-
- (5) Some applications which accept a CSV file as input expect the first line
- to contain the names of the fields which comprise the subsequent records,
- e.g. if each record consists of a name and a four-part address this header
- record might read:
-
- “NAME”,“STREET”,“TOWN”,“COUNTY”,“POSTCODE”
-
- Powerbase can both export and import files with such a header.
-
- 8.2 Setting the CSV options
- -------------------------------
- Choosing CSV files => Options from the main menu displays the CSV options
- window which lets you specify all the file characteristics described
- earlier. Pop-up menus give you a choice of field-separators and
- record-terminators with space to enter your own if you wish. The first
- three option switches cause an exported file to have, respectively, the
- following characteristics when the switch is selected:
-
- • quotes round non-numeric fields − see (1) above
-
- • a header record specifying the field names as either tags or
- descriptors − see (5) above
-
- • null fields included − see (2) above
-
- Note that the field names referred to in a header record are, by default,
- the tags of the corresponding Powerbase fields, but may be changed to the
- descriptors by altering the setting in the Print options window.
-
- - 44 -
-
- 8.3 Exporting data as a CSV file
- ------------------------------------
- Having set up your options as described above, creating a CSV file is very
- like printing a list. First highlight the fields to be exported by clicking
- on each with ADJUST. Remember that the order in which the fields are
- highlighted is important. Next choose CSV files =>Export (Ctrl X). A window
- featuring the Query panel appears. Type in a search formula to determine
- which records are exported. Finally, enter the name of the file and drag the
- file icon to a filer window. By default the file is saved in PrintJobs as
- usual, and you may simply click on the Export button or type Return.
- Experiment with saving CSV files with different settings of the CSV options
- and then loading the resulting files into Edit to examine them. If you
- select the Reverse switch on the Query panel the CSV file will be created in
- reverse order. (See also 3.5.1).
-
- The field-concatenation option (see 3.2.2) applies. This means that data
- which occupies separate fields in the Powerbase record need not do so in the
- exported CSV file. By holding down Shift when selecting the field with
- ADJUST, the comma (or other separator) which would normally follow is
- suppressed until you select a field without using Shift. A slightly
- problematic situation occurs when you concatenate a mixture of numeric and
- non-numeric fields with the In quotes option selected. In such a case
- Powerbase will enclose the whole concatenated group within two sets of
- double quotes. e.g. NAME, Z, M and SYM from the Elements database would be
- exported, for actinium, as:
-
- “ACTINIUM 89 227 Ac”
-
- The Spacer (see 3.10) − in the above instance it is the default setting of
- one space − is used to separate the concatenated fields.
-
- 8.3.1 Scrollable lists and CSV files
-
- Scrollable fields may appear in a CSV file in two different ways, depending
- on how their printing option is set in the Print Options window. If As
- single row is selected the whole list is exported as a single CSV field. The
- data corresponding to individual rows of the list are separated by
- semicolons. If As columns is selected then each row of the list is exported
- as a CSV field.
-
- Transferring data from and to scrollable lists in individual records was
- covered in 2.6.5 and 2.6.6.
-
- 8.4 Using CSV files to import data
- --------------------------------------
- If you drop a CSV file on the record window of an open database the CSV
- options window (see 8.2) appears with the title changed to “Import CSV file”
- and some additional icons, one of which displays the pathname of the file.
- Clicking on Import will make Powerbase try to create new database records
- from the file. If you decide not to do this you should click on Cancel. If
- you do wish to import the data there are some important consideration which
- will now be explained.
-
- 8.4.1 Ensuring that the correct options are selected
-
- If the CSV file originally came from a Powerbase application the settings in
- the CSV options window need to be exactly the same as they were when the
- file was exported. The exception to this is the In quotes button which is
- shaded on import because Powerbase doesn’t need it. If the CSV file came
- from a PC or another RISC OS application you might have to load it into Edit
- to find out what separator and terminator are used. You will also probably
- need to set the filetype to &dfe (omit the “&”) as well.
-
- There are three more option switches at the bottom of the window. One causes
- each record to be displayed as it is imported. The import process is slower
- with this turned on but much more informative if you like to know how things
- are progressing. The second switch, when selected, strips any trailing
- spaces in the imported data-fields. Some database programs pad all fields to
- their maximum length by adding spaces to the end of the data where
- necessary. If you import such a file into Powerbase you will find that the
- caret will always be at the far right of the field even though visible
- characters do not fill the field, and some queries won’t work properly.
- Setting the Strip spaces switch before importing the file overcomes the
- problem. The third option switch determines how Sequence number fields are
- handled. If the switch is ON imported sequence numbers are ignored and new
- ones assigned in accordance with the field’s sequence number counter. With
- the switch OFF sequence numbers from the CSV file are imported without
- alteration.
-
- 8.4.2 Directing imported data to the correct fields
-
- If no fields on the record window are highlighted (i.e. with ADJUST) and the
- CSV file does not contain a header record (see 8.1) then the import process
- proceeds according to the following rules:
-
- - 45 -
-
- • The first CSV field will be read into the first Powerbase field for
- which importing is allowed, This means any which can hold text
- (including External fields) or numbers and also check-boxes Graphics
- fields, Buttons or fields which are merely labels will be ignored.
- The next CSV field will be read into the second Powerbase field and
- so on.
-
- • If the end of the CSV record is reached before all the relevant
- fields have been filled (data underflow) then the next CSV record
- will start a new Powerbase record, i.e. the reading won’t get out of
- step. It does not matter, therefore, if the CSV file omits null
- items at the end of a line.
-
- • If all relevant fields are filled before reaching the end of the CSV
- record (data overflow) Powerbase ignores the remainder of the line
- and skips to the beginning of the next CSV record before starting a
- new Powerbase record. This is also to keep the operation in step.
-
- There will be occasions when you don’t want to fill the Powerbase fields
- sequentially as just described. There are two ways of making the process
- more specific:
-
- (1) Highlight the required Powerbase fields with ADJUST before starting the
- import. Data will then be read only into the highlighted fields, all other
- fields being ignored. The order in which the fields are filled is the order
- in which you highlighted them. The rules given above about underflow and
- overflow of data still apply.
-
- (2) Give the file a header record containing the tags or descriptors of
- required fields in the Powerbase record. There is nothing to prevent you
- using Edit to add such a header to a CSV file which did not originate from
- a Powerbase application. The format of the header is illustrated in 8.1 (5).
- Importing then occurs just as if those fields were highlighted. The Print
- options window must reflect whether the tags or descriptors of fields are
- used. Do not use both a header and highlighting.
-
- 8.4.3 Importing data from plain text files
-
- It was explained in 8.1 (4) that files created as described above (8.3) can
- have separators other than a comma. If the file has been created using
- Powerbase’s CSV exporting facility it will have been given the file-type
- &dfe as if it was a true CSV file. If it comes from some other source it is
- likely to be of type &fff (Text). It can still be imported but caution is
- needed because there are other circumstances in which a text file might be
- dropped on the record window. A properly-written script file (see Ch 12)
- would be recognised as such and therefore cause no problem, but any text
- file dropped onto the appropriate type of External field (i.e. a Text or
- Text Block field) would become linked to that field instead of being treated
- like a CSV file. If you are importing data from a plain text file and your
- record contains fields of the aforementioned types be sure to drop the file
- on the window background, not on the External field The appropriate window
- will then appear with the title “Import text file”.
-
- You are strongly advised to use proper CSV files if at all possible. You
- could, in fact, use the filer to set the filetype of such non-standard files
- to &dfe (omit the “&”) so that the problem of Powerbase taking the wrong
- action doesn’t arise. Be sure to use Options to set the correct field
- separator and record terminator though.
-
- 8.4.4 What if the imported data won’t fit?
-
- There are two situations in which this can happen. The database might not
- contain enough free records to hold all the imported data and so you get a
- “Database full when reading CSV file” error. To avoid this either make sure
- the database is big enough before you start or place a suitable value in the
- Increment for expansion icon in the Change length window. The latter is
- accessible from the Utilities submenu of the iconbar menu (see 10.5).
-
- The second situation is where an item is too long for the destined database
- field. When importing data Powerbase maintains a file called TooBig inside
- the database’s PrintJobs directory. Anything which won’t fit in the target
- field is instead written to this file together with information about where
- it was intended to go and an “@” character is placed in the database field
- to draw your attention to it. No writable Powerbase field may be longer than
- 246 characters and if an item of imported data exceeds this a note will be
- made in the TooBig file advising you to define an External field (Text Block
- or Text) for such data. After completing a CSV import operation it is
- advisable to look at the TooBig file to see if any remedial action is
- needed.
-
- - 46 -
-
- 8.5 Using CSV files to modify existing records
- --------------------------------------------------
- Most database programs which support CSV import allow it to be used only for
- creating new records. Powerbase is unusual in that you can use a CSV file
- to modify existing records. This capability should be used with caution
- since careless use can irrevocably garble a database. There are three
- relevant radio buttons in the CSV options window. They are called Modify
- existing, With primary key and With rec. number and their actions are as
- follows:
-
- Modify existing No new records will be created when a CSV file is dropped
- onto the record window. The existing records will be accessed in the order
- determined by the current index and the new data will be merged into these
- records. You should, of course, either use ADJUST to highlight the fields
- into which the data is to go or place a header in the CSV file specifying
- the field tags and then turn on the With header switch (see 8.4.2). If all
- the records are modified before the end of the CSV file has been reached a
- warning message is displayed. Note that it is the user’s responsibility to
- ensure that the data in the CSV file is in the correct order since Powerbase
- has no way of telling which data is destined for which record and can only
- proceed sequentially. As an additional precaution a warning message is
- displayed when you choose this option.
-
- With primary key This affects both export and import. When you export data
- with this button selected each record of the CSV file includes the primary
- key of the Powerbase record. (Try creating a CSV file with and without this
- option set and compare the files using Edit.) When importing such a file
- Powerbase will attempt to locate records with the same primary keys as the
- records in the CSV file. If a matching key is not found a new record will be
- created to receive the data. If the key does exist then the CSV data goes
- into the same record, overwriting any data which the target fields already
- contain. The option is only useful for transferring data between Powerbase
- databases which have the same primary key field(s) and structure. Don’t
- forget that Powerbase, by default, allows duplicate primary keys so it is
- possible for the new data to go into the wrong record. You would be
- ill-advised to trust this method of import unless you are sure each primary
- key is unique.
-
- With record number This resembles the previous option. It allows data to be
- exported with the Powerbase record numbers included. Importing into another
- (or the same) database places the data in records having the same record
- numbers, again overwriting data which may already be present in the target
- fields. This option is only useful for transferring data between Powerbase
- databases in which corresponding records have identical record numbers. One
- use for it might be to export a set of data, load it into an editor and do
- some extensive searching-and-replacing, then put the modified data back into
- the original database.
-
- 8.6 Creating a new, working database from a CSV file
- --------------------------------------------------------
- There is an option button in the CSV options window called With field data
- which only becomes active when the With header button is selected. With the
- switch selected the header record of a saved CSV file contains not only the
- field names but also the field lengths and types. Again it is suggested that
- you create a file and look at it in Edit. Each item in the header begins
- with a number. This is the field length in characters, i.e. the maximum
- allowed length in the Powerbase record field. Next comes a ¤ character which
-
- separates the length from the field name. Another ¤ separates the field name
- from the concluding number which determines the field type. (The type
- numbers may be inspected by looking at the ValStrings file in the Powerbase
- directory.)
-
- A file created in this form is not meant to be dropped onto the window of an
- open database. It should be dropped onto the Powerbase icon on the iconbar
- when no database is open. Powerbase will attempt to convert the file into a
- functioning database. All the fields will be ranged on the left of the
- record window, one beneath another and the primary key will consist of the
- first four characters of the first field. Don’t try to create databases
- containing Check box, Button or External fields using this method; it’s only
- meant for the most basic type of database in which all fields are of the
- Editable class.
-
- - 47 -